Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes
Abstract
Reinforcement learning (RL) has become a central paradigm for solving learning-control problems in robotics and artificial intelligence. RL researchers have focussed almost exclusively on problems where the controller has to maximize the discounted sum of payoffs. However, as emphasized by Schwartz (1993), in many problems, e.g., those for which the optimal behavior is a limit cycle, it is more natural and computationally advantageous to formulate tasks so that the controller’s objective is to maximize the average payoff received per time step. In this paper I derive new average-payoff RL algorithms as stochastic approximation methods for solving the system of equations associated with the policy evaluation and optimal control questions in average-payoff RL tasks. These algorithms are analogous to the popular TD and Q-learning algorithms already developed for the discounted-payoff case. One of the algorithms derived here is a significant variation of Schwartz’s R-learning algorithm. Preliminary empirical results are presented to validate these new algorithms.
Similar resources
Non-Deterministic Policies In Markovian Processes
Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct adaptive treatment strategies, where a sequence of individualized treatments is learned from clinic...
Human learning in non-Markovian decision making
Humans can learn under a wide variety of feedback conditions. Particularly important types of learning fall under the category of reinforcement learning (RL), where a series of decisions must be made and a sparse feedback signal is obtained. Computational and behavioral studies of RL have focused mainly on Markovian decision processes (MDPs), where the next state and reward depend only on the c...
Learning Without State-Estimation in Partially Observable Markovian Decision Processes
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately, all of the theory and much of the practice (see Barto et al. for an exception) of RL is limited to Markovian decision processes (MDPs). Many real-world decision tasks, however, are inherently non-Markovian, i.e., the state of the environment is only incomp...
A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon
Many reinforcement learning algorithms, like Q-Learning or R-Learning, correspond to adaptive methods for solving Markovian decision problems in infinite-horizon when no model is available. In this article we consider the particular framework of non-stationary finite-horizon Markov Decision Processes. After establishing a relationship between the finite-horizon total reward criterion and the avera...
Non-Deterministic Policies in Markovian Decision Processes
Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision-making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct decision support systems for action selection in Markovian environments. Although conventional meth...